---
id: synthesizer-introduction
title: Introduction to Synthetic Data Generation
sidebar_label: Introduction
---

<head>
  <link
    rel="canonical"
    href="https://deepeval.com/docs/synthesizer-introduction"
  />
</head>

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

`deepeval`'s `Synthesizer` offers a fast and easy way to generate high-quality **single and multi-turn goldens** for your evaluation datasets in just a few lines of code. This is especially helpful if:

- You don't have an evaluation dataset to start with
- You have a small dataset and wish to augment it with existing examples
- You have a knowledge base and want to create a dataset out of it

:::note
For single-turn generations, note that `deepeval`'s `Synthesizer` does **NOT** generate `actual_output`s for each golden. This is because `actual_output`s are meant to be generated by your LLM (application), not `deepeval`'s synthesizer.

For multi-turn generations, `deepeval`'s `Synthesizer` also does not generate `turns`. Use the [`ConversationSimulator`](/docs/conversation-simulator) to simulate `turns` instead.
:::

<details>
<summary>Should you generate synthetic datasets?</summary>

Synthesizing evaluation data is especially helpful if you don't have a prepared evaluation dataset, as it will **help you generate the initial testing data you need** to get up and running with evaluation.

However, you should aim to manually inspect and edit any synthetic data where possible.

</details>

## Quick Summary

The `Synthesizer` uses an LLM to first generate a series of inputs/scenarios, before evolving them to become more complex and realistic. These evolved inputs/scenarios are then used to create a list of synthetic goldens, which can be single or multi-turn and make up your synthetic `EvaluationDataset`.

To begin generating goldens, paste in the following code:

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python title="main.py"
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=['example.txt'], # Replace with your file
    include_expected_output=True
)
print(goldens)
```

</TabItem>

<TabItem value="multi-turn" label="Multi-Turn">

```python title="main.py"
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
conversational_goldens = synthesizer.generate_conversational_goldens_from_docs(
    document_paths=['example.txt'], # Replace with your file
    include_expected_outcome=True
)
print(conversational_goldens)
```

</TabItem>

</Tabs>

```bash
python main.py
```

Congratulations 🎉🥳! You've just generated your first set of synthetic goldens.

:::info
`deepeval`'s `Synthesizer` uses the data evolution method to generate large volumes of data across various complexity levels to make synthetic data more realistic. This method was originally introduced by the developers of [Evol-Instruct and WizardLM.](https://arxiv.org/abs/2304.12244)

For those interested, here is a [great article on how `deepeval`'s synthesizer was built.](https://www.confident-ai.com/blog/the-definitive-guide-to-synthetic-data-generation-using-llms)
:::

## Create Your First Synthesizer

To start generating goldens for your `EvaluationDataset`, begin by creating a `Synthesizer` object:

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
```

There are **SEVEN** optional parameters when creating a `Synthesizer`:

- [Optional] `async_mode`: a boolean which when set to `True`, enables **concurrent generation of goldens**. Defaulted to `True`.
- [Optional] `model`: a string specifying which of OpenAI's GPT models to use for generation, **OR** [any custom LLM model](/docs/metrics-introduction#using-a-custom-llm) of type `DeepEvalBaseLLM`. Defaulted to `gpt-4.1`.
- [Optional] `max_concurrent`: an integer that determines the maximum number of goldens that can be generated in parallel at any point in time. You can decrease this value if you're running into rate limit errors. Defaulted to `100`.
- [Optional] `filtration_config`: an instance of type `FiltrationConfig` that allows you to [customize the degree to which goldens are filtered](#filtration-quality) during generation. Defaulted to the default `FiltrationConfig` values.
- [Optional] `evolution_config`: an instance of type `EvolutionConfig` that allows you to [customize the complexity of evolutions applied](#evolution-complexity) during generation. Defaulted to the default `EvolutionConfig` values.
- [Optional] `styling_config`: an instance of type `StylingConfig` that allows you to [customize the styles and formats](#styling-options) of generations. Defaulted to the default `StylingConfig` values.
- [Optional] `cost_tracking`: a boolean which when set to `True`, will print the cost incurred by your LLM during golden synthesization.

:::note
The `filtration_config`, `evolution_config`, and `styling_config` parameters allow you to customize the goldens being generated by your `Synthesizer`.

In addition, the `model` for your `Synthesizer` will automatically be used for the `critic_model`s of the [`FiltrationConfig`](#filtration-quality) and [`ContextConstructionConfig`](/docs/synthesizer-generate-from-docs#customize-context-construction) **if the respective custom config instances are not provided**.
:::
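To build intuition for what `max_concurrent` controls, here is a minimal sketch of bounding concurrent work with an `asyncio.Semaphore`. It is not `deepeval`'s implementation: `generate_one`, `generate_all`, and the `stats` dictionary are hypothetical names, and the `asyncio.sleep` call stands in for an LLM request.

```python
import asyncio


async def generate_one(index: int, limiter: asyncio.Semaphore, stats: dict) -> str:
    # Hypothetical worker: waits for a free slot before doing any work,
    # so at most `max_concurrent` workers are inside this block at once.
    async with limiter:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)  # stand-in for an LLM call
        stats["running"] -= 1
        return f"golden-{index}"


async def generate_all(n_goldens: int, max_concurrent: int) -> tuple[list[str], int]:
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"running": 0, "peak": 0}
    tasks = [generate_one(i, limiter, stats) for i in range(n_goldens)]
    results = await asyncio.gather(*tasks)  # results keep task order
    return list(results), stats["peak"]


goldens, peak = asyncio.run(generate_all(n_goldens=10, max_concurrent=3))
print(len(goldens), "goldens generated, peak concurrency:", peak)
```

Lowering the limit trades throughput for fewer simultaneous requests, which is why decreasing `max_concurrent` can help with rate limit errors.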

## Generate Your First Golden

Once you've created a `Synthesizer` object with the desired filtering parameters and models, you can begin generating goldens.

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=['example.txt', 'example.docx', 'example.pdf', 'example.md', 'example.markdown', 'example.mdx'],
    include_expected_output=True
)
print(goldens)
```

In this example, we've used the `generate_goldens_from_docs()` method, which is one of the four single-turn generation methods offered by `deepeval`'s `Synthesizer`. The four methods are:

- [`generate_goldens_from_docs()`](/docs/synthesizer-generate-from-docs): useful for generating goldens to evaluate your LLM application based on contexts extracted from your knowledge base in the form of documents.
- [`generate_goldens_from_contexts()`](/docs/synthesizer-generate-from-contexts): useful for generating goldens to evaluate your LLM application based on a list of prepared contexts.
- [`generate_goldens_from_scratch()`](/docs/synthesizer-generate-from-scratch): useful for generating goldens to evaluate your LLM application without relying on contexts from a knowledge base.
- [`generate_goldens_from_goldens()`](/docs/synthesizer-generate-from-goldens): useful for generating goldens by augmenting a known set of goldens.

:::tip
You might have noticed that `generate_goldens_from_docs()` is a superset of `generate_goldens_from_contexts()`, and `generate_goldens_from_contexts()` is a superset of `generate_goldens_from_scratch()`.

This implies that if you want more control over context extraction, you should use `generate_goldens_from_contexts()`, but if you want `deepeval` to take care of context extraction as well, use `generate_goldens_from_docs()`.
:::

</TabItem>

<TabItem value="multi-turn" label="Multi-Turn">

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
conversational_goldens = synthesizer.generate_conversational_goldens_from_docs(
    document_paths=['example.txt', 'example.docx', 'example.pdf', 'example.md', 'example.markdown', 'example.mdx'],
    include_expected_outcome=True
)
print(conversational_goldens)
```

In this example, we've used the `generate_conversational_goldens_from_docs()` method, which is one of the four multi-turn generation methods offered by `deepeval`'s `Synthesizer`. The four methods are:

- [`generate_conversational_goldens_from_docs()`](/docs/synthesizer-generate-from-docs): useful for generating goldens to evaluate your LLM application based on contexts extracted from your knowledge base in the form of documents.
- [`generate_conversational_goldens_from_contexts()`](/docs/synthesizer-generate-from-contexts): useful for generating goldens to evaluate your LLM application based on a list of prepared contexts.
- [`generate_conversational_goldens_from_scratch()`](/docs/synthesizer-generate-from-scratch): useful for generating goldens to evaluate your LLM application without relying on contexts from a knowledge base.
- [`generate_conversational_goldens_from_goldens()`](/docs/synthesizer-generate-from-goldens): useful for generating goldens by augmenting a known set of goldens.

:::tip
You might have noticed that `generate_conversational_goldens_from_docs()` is a superset of `generate_conversational_goldens_from_contexts()`, and `generate_conversational_goldens_from_contexts()` is a superset of `generate_conversational_goldens_from_scratch()`.

This implies that if you want more control over context extraction, you should use `generate_conversational_goldens_from_contexts()`, but if you want `deepeval` to take care of context extraction as well, use `generate_conversational_goldens_from_docs()`.
:::

</TabItem>

</Tabs>

Once generation is complete, you can also convert your synthetically generated goldens into a DataFrame:

```python
dataframe = synthesizer.to_pandas()
print(dataframe)
```

Here's an example of what the resulting DataFrame might look like for a single-turn generation:

| <div style={{width: "200px"}}>input</div>      | actual_output | expected_output | <div style={{width: "280px"}}>context</div>                           | retrieval_context | n_chunks_per_context | context_length | context_quality | synthetic_input_quality | evolutions | source_file |
| ---------------------------------------------- | ------------- | --------------- | --------------------------------------------------------------------- | ----------------- | -------------------- | -------------- | --------------- | ----------------------- | ---------- | ----------- |
| Who wrote the novel "1984"?                    | None          | George Orwell   | ["1984 is a dystopian novel published in 1949 by George Orwell."]     | None              | 1                    | 60             | 0.5             | 0.6                     | None       | file1.txt   |
| What is the boiling point of water in Celsius? | None          | 100°C           | ["Water boils at 100°C (212°F) under standard atmospheric pressure."] | None              | 1                    | 55             | 0.4             | 0.9                     | None       | file2.txt   |
| ...                                            | ...           | ...             | ...                                                                   | ...               | ...                  | ...            | ...             | ...                     | ...        | ...         |

And that's it! You now have access to a list of synthetic goldens generated using information from your knowledge base.

## Save Your Synthetic Dataset

### On Confident AI

To avoid losing any generated synthetic `Goldens`, you can push a dataset containing the generated goldens to Confident AI:

```python
from deepeval.dataset import EvaluationDataset
...

dataset = EvaluationDataset(goldens=synthesizer.synthetic_goldens)
dataset.push(alias="My Generated Dataset")
```

This keeps your dataset on the cloud and you'll be able to edit and version control it in one place. When you are ready to evaluate your LLM application using the generated goldens, simply pull the dataset from the cloud like how you would pull a GitHub repo:

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
...

dataset = EvaluationDataset()
# Same alias as before
dataset.pull(alias="My Generated Dataset")
evaluate(dataset, metrics=[AnswerRelevancyMetric()])
```

### Locally

Alternatively, you can use the `save_as()` method to save synthetic goldens locally:

```python
synthesizer.save_as(
    # Type of file to save ('json' or 'csv')
    file_type='json',
    # Directory where the file will be saved
    directory="./synthetic_data"
)
```

The `save_as()` method supports the following parameters:

- `file_type`: Specifies the format to save the data ('json' or 'csv')
- `directory`: The folder path where the file will be saved
- `file_name`: Optional custom filename without extension - when provided, the file will be saved as "{file_name}.{file_type}"
- `quiet`: Optional boolean to suppress output messages about the save location

By default, the method generates a timestamp-based filename (e.g., "20240523_152045.json"). When you provide a custom filename with the `file_name` parameter, that name is used as the base filename and the extension is added according to the `file_type` parameter.

For example, if you specify `file_type='json'` and `file_name='my_dataset'`, the file will be saved as "my_dataset.json".

```python
# Save as JSON with a custom filename my_dataset.json
synthesizer.save_as(
    file_type='json',
    directory="./synthetic_data",
    file_name="my_dataset"
)

# Save as CSV with a custom filename my_dataset.csv
synthesizer.save_as(
    file_type='csv',
    directory="./synthetic_data",
    file_name="my_dataset"
)
```

:::caution
Note that `file_name` should not contain any periods or file extensions, as these will be automatically added based on the `file_type` parameter.
:::
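The naming rules above can be sketched as a small helper. This is only an illustration, not the library's code: `resolve_save_path` is a hypothetical function, the timestamp format is inferred from the "20240523_152045.json" example, and rejecting periods with a `ValueError` is an assumption about how a stricter implementation might behave.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_save_path(
    directory: str,
    file_type: str,
    file_name: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Path:
    # Hypothetical helper mirroring the documented naming rules.
    if file_type not in ("json", "csv"):
        raise ValueError("file_type must be 'json' or 'csv'")
    if file_name is None:
        # Default: timestamp-based name such as 20240523_152045
        base = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    else:
        # Assumption: reject periods so the extension is never duplicated
        if "." in file_name:
            raise ValueError("file_name should not contain periods or extensions")
        base = file_name
    return Path(directory) / f"{base}.{file_type}"


print(resolve_save_path("./synthetic_data", "json", file_name="my_dataset").name)
print(resolve_save_path("./synthetic_data", "csv", now=datetime(2024, 5, 23, 15, 20, 45)).name)
```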

## Customize Your Generations

`deepeval`'s `Synthesizer`'s generation pipeline is made up of several components, which you can easily customize to determine the quality and style of the resulting generated goldens.

:::tip
You might find it useful to first [learn about all the different components and steps that make up the `Synthesizer` generation pipeline](#how-does-it-work).
:::

### Filtration Quality

You can customize the degree to which generated goldens are filtered to ensure the quality of synthetic inputs by instantiating the `Synthesizer` with a `FiltrationConfig` instance.

```python
from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import FiltrationConfig

filtration_config = FiltrationConfig(
  critic_model="gpt-4.1",
  synthetic_input_quality_threshold=0.5
)

synthesizer = Synthesizer(filtration_config=filtration_config)
```

There are **THREE** optional parameters when creating a `FiltrationConfig`:

- [Optional] `critic_model`: a string specifying which of OpenAI's GPT models to use to determine synthetic input `quality_score`s, **OR** [any custom LLM model](/docs/metrics-introduction#using-a-custom-llm) of type `DeepEvalBaseLLM`. Defaulted to the **model used in the `Synthesizer`**, else `gpt-4.1` when initialized as a standalone instance.
- [Optional] `synthetic_input_quality_threshold`: a float representing the minimum quality threshold for synthetic input generation. Inputs with `quality_score`s lower than the `synthetic_input_quality_threshold` will be rejected. Defaulted to `0.5`.
- [Optional] `max_quality_retries`: an integer that specifies the number of times to retry synthetic input generation if it does not meet the required quality. Defaulted to `3`.

If the `quality_score` is still lower than the `synthetic_input_quality_threshold` after `max_quality_retries`, the golden with the highest `quality_score` will be used.

### Evolution Complexity

You can customize the evolution types and depth applied by instantiating the `Synthesizer` with an `EvolutionConfig` instance. You should customize the `EvolutionConfig` to vary the complexity of the generated goldens.

```python
from deepeval.synthesizer import Synthesizer, Evolution
from deepeval.synthesizer.config import EvolutionConfig

evolution_config = EvolutionConfig(
    evolutions={
        Evolution.REASONING: 1/4,
        Evolution.MULTICONTEXT: 1/4,
        Evolution.CONCRETIZING: 1/4,
        Evolution.CONSTRAINED: 1/4
    },
    num_evolutions=4
)

synthesizer = Synthesizer(evolution_config=evolution_config)
```

There are **TWO** optional parameters when creating an `EvolutionConfig`:

- [Optional] `evolutions`: a dict with `Evolution` keys and sampling probability values, specifying the distribution of data evolutions to be used. Defaulted to all `Evolution`s with equal probability.
- [Optional] `num_evolutions`: the number of evolution steps to apply to each generated input. This parameter controls the complexity and diversity of the generated dataset by iteratively refining and evolving the initial inputs. Defaulted to 1.

:::info

`Evolution` is an `ENUM` that specifies the different data evolution techniques you wish to employ to make synthetic `Golden`s more realistic. `deepeval`'s `Synthesizer` supports 7 types of evolutions, which are randomly sampled based on a defined distribution. You can apply multiple evolutions to each `Golden`, and later access the evolution sequence through the `Golden`'s additional metadata field.

If used for RAG evaluation: Note that some evolution techniques do not necessarily require that the evolved input can be answered from the context. Currently, only these 4 types of evolutions stick to the context: `Evolution.MULTICONTEXT`, `Evolution.CONCRETIZING`, `Evolution.CONSTRAINED` and `Evolution.COMPARATIVE`.

```python
from deepeval.synthesizer import Evolution

available_evolutions = {
    Evolution.REASONING: 1/7,
    Evolution.MULTICONTEXT: 1/7, # sticks to the context
    Evolution.CONCRETIZING: 1/7, # sticks to the context
    Evolution.CONSTRAINED: 1/7, # sticks to the context
    Evolution.COMPARATIVE: 1/7, # sticks to the context
    Evolution.HYPOTHETICAL: 1/7,
    Evolution.IN_BREADTH: 1/7,
}
```

:::

### Styling Options

You can customize the output style and format of any `input` and/or `expected_output` generated by instantiating the `Synthesizer` with a `StylingConfig` instance.

```python
from deepeval.synthesizer import Synthesizer
from deepeval.synthesizer.config import StylingConfig

styling_config = StylingConfig(
  input_format="Questions in English that ask for data in a database.",
  expected_output_format="SQL query based on the given input",
  task="Answering text-to-SQL-related queries by querying a database and returning the results to users",
  scenario="Non-technical users trying to query a database using plain English.",
)

synthesizer = Synthesizer(styling_config=styling_config)
```

There are **FOUR** optional parameters when creating a `StylingConfig`:

- [Optional] `input_format`: a string, which specifies the desired format of the generated `input`s in the synthesized goldens. Defaulted to `None`.
- [Optional] `expected_output_format`: a string, which specifies the desired format of the generated `expected_output`s in the synthesized goldens. Defaulted to `None`.
- [Optional] `task`: a string representing the purpose of the LLM application you're trying to evaluate, i.e. what it is tasked with. Defaulted to `None`.
- [Optional] `scenario`: a string representing the setting the LLM application you're trying to evaluate is placed in. Defaulted to `None`.

The `scenario`, `task`, `input_format`, and/or `expected_output_format` parameters, if provided at all, are used to enforce the styles and formats of any generated goldens.

## How Does it Work?

`deepeval`'s `Synthesizer` generation pipeline consists of four main steps:

1. **Input Generation**: Generate synthetic golden `input`s with or without provided contexts.
2. **Filtration**: Filter away any initial synthetic goldens that don't meet the specified generation standards.
3. **Evolution**: Evolve the filtered synthetic goldens to increase complexity and make them more realistic.
4. **Styling**: Style the output formats of the `input`s and `expected_output`s of the evolved synthetic goldens.

This generation pipeline is the same for `generate_goldens_from_docs()`, `generate_goldens_from_contexts()`, and `generate_goldens_from_scratch()`.

:::tip
There are two steps not mentioned - the context construction step and expected output generation step.

The **context construction step** [(which you can learn about here)](/docs/synthesizer-generate-from-docs#how-does-context-construction-work) happens before the initial generation step. It isn't mentioned above because it is only required if you're using the `generate_goldens_from_docs()` method.

As for the **expected output generation step**, it's omitted because it is a trivial one-step process that simply happens right before the final styling step.
:::
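As a quick mental model, the ordering of the four main steps and the two optional ones can be written down as a tiny function. This is only a sketch of the ordering described above: `pipeline_steps` is a hypothetical helper, and tying the expected output step to an `include_expected_output` flag is an assumption made for illustration.

```python
def pipeline_steps(from_docs: bool, include_expected_output: bool) -> list[str]:
    # Hypothetical helper: returns step names in the order described above.
    steps = []
    if from_docs:
        # Only needed when contexts must first be extracted from documents
        steps.append("context construction")
    steps += ["input generation", "filtration", "evolution"]
    if include_expected_output:
        # Assumed to run only when expected outputs are requested
        steps.append("expected output generation")
    steps.append("styling")
    return steps


for number, step in enumerate(pipeline_steps(from_docs=True, include_expected_output=True), start=1):
    print(number, step)
```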

### Input Generation

In the initial **input generation** step, `input`s of goldens are generated with or without provided contexts using an LLM. Provided contexts, which can be in the form of a list of strings or a list of documents, allow generated goldens to be grounded in information presented in your knowledge base.

### Filtration

:::note
The position of this step might come as a surprise, but the filtration step happens this early in the pipeline because `deepeval` assumes that goldens that pass the initial filtration step will not degrade in quality upon further evolution and styling.
:::

In the **filtration** step, `input`s of generated goldens are subject to quality filtering. These synthetic `input`s are evaluated and assigned a quality score (0-1) by an LLM based on:

- **Self-containment**: The `input` is understandable and complete without needing additional external context or references.
- **Clarity**: The `input` clearly conveys its intent, specifying the requested information or action without ambiguity.

<div
  style={{
    display: "flex",
    alignItems: "center",
    justifyContent: "center",
  }}
>
  <img
    src="https://deepeval-docs.s3.amazonaws.com/generation-filtration.svg"
    style={{
      marginBottom: "10px",
      height: "auto",
      maxHeight: "800px",
    }}
  />
</div>

Any golden with a quality score below the `synthetic_input_quality_threshold` will be re-generated. If the quality score still does not meet the required `synthetic_input_quality_threshold` after the allowed `max_quality_retries`, the generation with the highest score is used. As a result, some generated `Golden`s in your final evaluation dataset may not meet the minimum input quality score, but you are guaranteed at least one golden regardless of its quality.

[Click here](#filtration-quality) to learn how to customize the `synthetic_input_quality_threshold` and `max_quality_retries` parameters.
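The retry-and-fallback behavior described above can be sketched in plain Python. This is an illustration under stated assumptions, not `deepeval`'s internals: `generate_with_retries`, `generate`, and `score` are hypothetical names, the scores are hard-coded rather than produced by an LLM, and counting one initial attempt plus `max_quality_retries` retries is an assumption.

```python
from typing import Callable, Optional, Tuple


def generate_with_retries(
    generate: Callable[[], str],
    score: Callable[[str], float],
    threshold: float = 0.5,
    max_quality_retries: int = 3,
) -> Tuple[Optional[str], float]:
    best_input, best_score = None, -1.0
    # Assumption: one initial attempt plus up to max_quality_retries retries
    for _ in range(1 + max_quality_retries):
        candidate = generate()
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_input, best_score = candidate, candidate_score
        if candidate_score >= threshold:
            break  # good enough, stop retrying
    # If nothing met the threshold, the highest-scoring attempt is returned
    return best_input, best_score


# Hard-coded stand-ins for an LLM generator and an LLM critic
fake_scores = {"It?": 0.1, "What about that?": 0.3, "Who wrote the novel 1984?": 0.9}
attempts = iter(fake_scores)
accepted = generate_with_retries(lambda: next(attempts), fake_scores.__getitem__)

weak_scores = {"It?": 0.1, "That one?": 0.4, "Why?": 0.2}
weak_attempts = iter(weak_scores)
fallback = generate_with_retries(
    lambda: next(weak_attempts), weak_scores.__getitem__, max_quality_retries=2
)
print(accepted, fallback)
```

The second call shows the fallback path: none of the attempts reach the threshold, so the best one seen is kept.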

### Evolution

In the **evolution** step, the `input`s of the filtered goldens are rewritten to make them more complex and realistic, often to the point of being indistinguishable from human-curated goldens. Each `input` is rewritten `num_evolutions` times, where each evolution is sampled from the `evolutions` distribution and adds an additional layer of complexity to the rewritten `input`.

[Click here](#evolution-complexity) to learn how to customize the `evolutions` and `num_evolutions` parameters.

:::info

As an example, a golden might take the following evolutionary route when `num_evolutions` is set to 2 and `evolutions` is a dictionary containing `Evolution.IN_BREADTH`, `Evolution.COMPARATIVE`, and `Evolution.REASONING`, with sampling probabilities of 0.4, 0.2, and 0.4, respectively:

<div
  style={{
    display: "flex",
    alignItems: "center",
    justifyContent: "center",
  }}
>
  <img
    src="https://deepeval-docs.s3.amazonaws.com/evolutions.svg"
    style={{
      marginTop: "20px",
      marginBottom: "50px",
      height: "auto",
      maxHeight: "400px",
    }}
  />

</div>

:::
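One way to picture how an evolutionary route is drawn from the `evolutions` distribution is weighted sampling. The sketch below is illustrative only: `ToyEvolution` is a local stand-in for `deepeval`'s `Evolution` enum, `sample_evolution_route` is a hypothetical helper, and sampling each step independently (so the same evolution can repeat) is an assumption.

```python
import random
from enum import Enum


class ToyEvolution(Enum):
    # Local stand-in so the example runs without deepeval installed
    IN_BREADTH = "In-Breadth"
    COMPARATIVE = "Comparative"
    REASONING = "Reasoning"


def sample_evolution_route(evolutions: dict, num_evolutions: int, rng: random.Random) -> list:
    kinds = list(evolutions.keys())
    weights = list(evolutions.values())
    # Draw one evolution per step, weighted by its sampling probability
    return rng.choices(kinds, weights=weights, k=num_evolutions)


distribution = {
    ToyEvolution.IN_BREADTH: 0.4,
    ToyEvolution.COMPARATIVE: 0.2,
    ToyEvolution.REASONING: 0.4,
}
route = sample_evolution_route(distribution, num_evolutions=2, rng=random.Random(0))
print([evolution.name for evolution in route])
```

Passing a seeded `random.Random` keeps the sampled route reproducible across runs, which is handy when comparing datasets.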

### Styling

:::tip
This might be useful if, for example, you want to generate goldens in another language, or have the `expected_output`s be in SQL format for a text-to-SQL use case.
:::

In the final **styling** step, the `input`s and `expected_output`s of each golden are rewritten into the desired formats and styles if required. You can configure this by setting the `scenario`, `task`, `input_format`, and `expected_output_format` parameters. `deepeval` uses what you provide to style goldens for your use case at the end of the generation pipeline.

[Click here](#styling-options) to learn how to customize the format and style of the synthetic `input`s and `expected_output`s being generated.
